Finding Related Entities by Retrieving Relations: UIUC at TREC 2009 Entity Track
Abstract
Our goal in participating in the TREC 2009 Entity Track was to study whether relation extraction techniques can help improve the accuracy of the entity finding task. Finding related entities is informational in nature, and we wanted to explore whether inducing structure on the queries helps satisfy this information need. Our research approach was to study techniques that retrieve relations between two entities from a large corpus and, from those, find the most relevant entities that participate in the given relation with another given entity. Instead of aiming to retrieve pages about specific entities, we tried to address the problem of finding the entities directly from the text. Our experimental results show that we were able to find many related entities using relation-based extraction, and that ranking entities based on further evidence from the text helps to a certain extent.
1. PROBLEM FORMULATION
The TREC 2009 Entity Track concentrated on finding related entities. Given an entity of focus, the nature of the relation between this entity and other entities, and some information about the type of the other entities, the goal was to find all the related entities. Instead of taking a search perspective and looking for homepages of related entities, we wanted to explore the information-seeking aspects of the query.
Our general goal for the Entity track was to study the usefulness of information extraction techniques in improving the accuracy of the entity finding task. In particular, we focused on investigating whether we can improve accuracy by formulating such entity-finding queries as relation queries, which can be answered through relation extraction. We formulated the query, described by the narrative, as an [entity relation entity] triplet, where the relation and the first entity are given, and the type of the other entity is known. The task is then to find instances of the other entity that satisfy the relation.
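As a rough illustration of the triplet formulation, the query can be thought of as a small record with two known slots and one typed, unknown slot. The following is a minimal sketch only; the class name `RelationQuery`, its field names, and the example values are assumptions made for illustration and do not come from the paper or from the TREC topics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationQuery:
    """An [entity relation entity] triplet whose second entity is unknown."""
    source_entity: str  # the given entity of focus
    relation: str       # relation stated in, or derived from, the narrative
    target_type: str    # known type of the entity to be found

    def accepts(self, candidate_type: str) -> bool:
        """Return True if a candidate's type matches the required target type."""
        return candidate_type.lower() == self.target_type.lower()


# Hypothetical query: the names below are invented, not an actual TREC topic.
query = RelationQuery(
    source_entity="Acme Airlines",
    relation="partners with",
    target_type="organization",
)
```

A candidate extracted from text would only be kept if `query.accepts(...)` returns true for its predicted type, which mirrors the type restriction on the second slot.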
Such a formulation also reflects many other semantic search applications, such as redacting (anonymizing) sensitive information, question answering, and intelligence gathering. This formulation is similar to the relation retrieval problem explored earlier in [7]. A relation is assumed to be a binary verb predicate over entities, where the entities can have roles in which they participate in the relation. For example, the relation of touring a new city can be modeled as a person visiting a location. Here, the person always fills the first slot and the location fills the second. Special filters can additionally be applied to the two slots to restrict them to specific subsets of persons or locations (such as a city or a country). In this year's task at TREC, one entity (usually the "first slot") is fixed, and the second entity is restricted by the type of the named entity. The relation was specified in the narrative, or had to be derived from it.
In the next sections, we describe our approach in detail. We start with the overall system architecture in section 2, detailing the core modules in sections 2.1, 2.2, and 2.3. We explain the parameters of our submitted runs in section 3 and briefly summarize our preliminary evaluation in section 4. Finally, we discuss some challenges we faced in section 5.
2. SYSTEM ARCHITECTURE
Our basic approach can be summarized by the following stages:
1. Formulate a structured query based on the given entity and the relation expressed by the narrative.
2. Retrieve relevant snippets that match the relations specified in the query.
3. Identify named entities in the resultant snippets using state-of-the-art NE taggers (for persons and organizations) and product identifiers.
4. Rank retrieved entities and find homepages for the entities.
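The stages above can be sketched end to end on a toy example. This is only an illustrative sketch under strong assumptions: the snippets and the gazetteer are invented, a dictionary lookup stands in for the NE taggers, substring matching stands in for retrieval, ranking is a simple count of supporting snippets, and homepage finding is left out entirely. None of these choices is taken from the described system.

```python
from collections import Counter

# Toy corpus and gazetteer, both invented for illustration.
SNIPPETS = [
    "Alice Smith visited Paris last spring.",
    "Alice Smith visited Berlin and later visited Paris again.",
    "Bob Jones moved to Rome.",
]
GAZETTEER = {
    "Paris": "location",
    "Berlin": "location",
    "Rome": "location",
    "Alice Smith": "person",
    "Bob Jones": "person",
}


def retrieve(snippets, entity, relation_words):
    """Stage 2: keep snippets that mention the entity and any relation word."""
    return [s for s in snippets
            if entity in s and any(w in s for w in relation_words)]


def tag_entities(snippet, target_type):
    """Stage 3: stand-in for an NE tagger, using gazetteer lookup."""
    return [name for name, etype in GAZETTEER.items()
            if etype == target_type and name in snippet]


def rank(snippets, target_type):
    """Stage 4: rank candidates by the number of snippets that support them."""
    counts = Counter()
    for s in snippets:
        counts.update(tag_entities(s, target_type))
    return [name for name, _ in counts.most_common()]


def find_related(entity, relation_words, target_type):
    """Stage 1 is implicit here: the arguments form the structured query."""
    hits = retrieve(SNIPPETS, entity, relation_words)
    return rank(hits, target_type)


results = find_related("Alice Smith", ["visited"], "location")
```

In this toy run the person fills the first slot and locations fill the second, as in the visiting example; candidates supported by more matching snippets are ranked higher.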
We further extend this basic approach in two ways:
• Before the retrieval step, we expand the relation expressed in the given query with semantically similar and related words, derived from WordNet and other linguistic resources such as distributional similarity, resulting in an expanded query for retrieval.
• After retrieval, we explore techniques to improve the accuracy of extraction by searching the corpus for more relations that link the two entities in similar contexts. This is expected to improve entity identification and the relative ranking of relevant entities.
The following sections explain the three steps in our model, viz. query formulation and retrieval, entity recognition, and entity filtering and re-ranking.
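The relation expansion step mentioned above, performed before retrieval, can be illustrated with a small sketch. It assumes a hand-written lexicon (`RELATED_WORDS`) standing in for WordNet and distributional similarity resources, and an illustrative Boolean query syntax; both are hypothetical and are not the resources or query language used in the described system.

```python
# Hypothetical hand-written lexicon standing in for WordNet and
# distributional similarity resources.
RELATED_WORDS = {
    "visit": ["tour", "travel to"],
    "acquire": ["buy", "purchase"],
}


def expand_relation(relation, lexicon=RELATED_WORDS):
    """Return the relation word followed by its related words, without duplicates."""
    expanded = [relation]
    for word in lexicon.get(relation, []):
        if word not in expanded:
            expanded.append(word)
    return expanded


def build_query(entity, relation):
    """Combine the entity with an OR-group of relation terms (syntax is illustrative)."""
    terms = " OR ".join(f'"{t}"' for t in expand_relation(relation))
    return f'"{entity}" AND ({terms})'


expanded_query = build_query("Alice Smith", "visit")
```

A relation word that is missing from the lexicon is simply left unexpanded, so the query falls back to the original relation term.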
Similar resources
A Novel Framework for Related Entities Finding: ICTNET at TREC 2009 Entity Track
This paper addresses the problem of related entity finding, which was proposed in TREC 2009. The overall aim of related entity finding (REF) is to perform entity-related search on Web data, which addresses common information needs that are not well modeled as ad hoc document search. In this paper, a novel framework was proposed based on a probabilistic model for related entity finding in a W...
A Journey in Entity Related Retrieval for TREC 2009
This paper presents the results of our entity retrieval experiments, namely retrieving the home pages of products, organizations and persons. The preliminary results of this study, based on the Indri Search Engine, were presented at the Entity Track in TREC 2009. Indri Search Engine is an efficient and effective open source tool, which is o...
TongKey at Entity Track TREC 2011: Related Entity Finding
This paper presents our work for the TREC 2011 Entity track. A retrieval model was proposed for the task of related entity finding. The model consists of several parts. To obtain a more accurate document collection, a query analysis method was used to format the narrative of each query. Our dataset was then generated using the ClueWeb09 API. Moreover, we employed the NER tools an...
Purdue at TREC 2010 Entity Track: A Probabilistic Framework for Matching Types Between Candidate and Target Entities
This paper gives an overview of our work for the TREC 2010 Entity track. The goal of the TREC Entity track is to study entity-related searches on Web data, which has not been sufficiently addressed in prior research. For both the Related Entity Finding (REF) task and the Entity List Completion (ELC) task in this track, we propose a unified probabilistic framework by incorporating the matching b...
Experiments on Related Entity Finding Track at TREC 2009
Our goal in participating in the TREC 2009 Entity Track is to study whether QA list techniques can help improve the accuracy of the entity finding task. We also address homepage finding, identifying the homepages of an entity by training a maximum entropy classifier and a logistic regression model for each of the three entity types.